$\mathbf{Purpose}$: To use artificial intelligence (AI) to: (1) exploit biomechanical knowledge of the optic nerve head (ONH) obtained from a relatively large population; (2) assess the robustness of an ONH from a single optical coherence tomography (OCT) scan; (3) identify what key three-dimensional (3D) structural features make a given ONH robust. $\mathbf{Design}$: Retrospective cross-sectional study. $\mathbf{Methods}$: 316 subjects had their ONHs imaged with OCT before and during acute intraocular pressure (IOP) elevation through ophthalmo-dynamometry. IOP-induced lamina cribrosa (LC) deformations were then mapped in 3D and used to classify ONHs: those with LC deformations above 4% were considered fragile, while those with deformations below 4% were considered robust. Learning from these data, we compared three AI algorithms for predicting robustness strictly from a baseline (undeformed) OCT volume: (1) a random forest classifier; (2) an autoencoder; and (3) a dynamic graph CNN (DGCNN). The latter algorithm also allowed us to identify what critical 3D structural features make a given ONH robust. $\mathbf{Results}$: All three methods were able to predict ONH robustness from 3D structural information alone, without the need for biomechanical testing. The DGCNN (area under the receiver operating curve [AUC]: 0.76 $\pm$ 0.08) outperformed the autoencoder (AUC: 0.70 $\pm$ 0.07) and the random forest classifier (AUC: 0.69 $\pm$ 0.05). Interestingly, to assess robustness, the DGCNN mainly used information from the sclera and the LC insertion sites. $\mathbf{Conclusions}$: We propose an AI-driven approach that can assess the robustness of a given ONH solely from a single OCT scan, without the need for biomechanical testing. Longitudinal studies should establish whether ONH robustness could help identify fast visual-field-loss progressors.
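As an illustration of the simplest of the three compared algorithms, below is a minimal sketch of the random-forest baseline, assuming each ONH has already been reduced to a vector of 3D structural features; the feature set, subject counts, and labels here are simulated placeholders, not the study's data.

```python
# Minimal sketch of the random-forest baseline described above, assuming each
# ONH has been reduced to a vector of 3D structural features (the feature set
# below is a simulated stand-in, not the paper's actual features).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_onh, n_features = 316, 12               # 316 subjects, hypothetical feature count
X = rng.normal(size=(n_onh, n_features))  # stand-in for extracted ONH features

# Label rule from the study: LC deformation > 4% -> fragile (1), else robust (0).
lc_deformation = rng.uniform(0, 8, size=n_onh)  # simulated deformation in %
y = (lc_deformation > 4.0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")
```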
Purpose: (1) To develop a deep learning algorithm to identify the major tissue structures of the optic nerve head (ONH) in 3D optical coherence tomography (OCT) scans; (2) to exploit such information to robustly differentiate among healthy, optic disc drusen (ODD), and papilledema ONHs. This was a cross-sectional comparative study of eyes with confirmed ODD, eyes with papilledema due to high intracranial pressure (51 eyes), and healthy controls (100 eyes). 3D scans of the ONHs were acquired using OCT and then processed to improve deep-tissue visibility. First, a deep learning algorithm was developed using 984 B-scans (from 130 eyes) to identify the major neural/connective tissues and ODD regions. The performance of our algorithm was assessed using the Dice coefficient (DC). In a second step, a classification algorithm (random forest) was designed using 150 OCT volumes to perform three-class classification (1: ODD, 2: papilledema, 3: healthy) strictly from the drusen and prelaminar swelling scores derived from the segmentations. To assess performance, we report the area under the receiver operating characteristic curve (AUC) for each class. Our segmentation algorithm was able to isolate neural and connective tissues, as well as ODD regions when present, with a mean DC of 0.93 $\pm$ 0.03 on the test set, corresponding to good performance. Classification was achieved with high AUCs: 0.99 $\pm$ 0.01 for detecting ODD, 0.99 $\pm$ 0.01 for detecting papilledema, and 0.98 $\pm$ 0.02 for detecting healthy ONHs. Our AI approach can accurately discriminate ODD from papilledema using a single OCT scan. Our classification performance was excellent, though it needs to be validated in larger populations. Our approach may have the potential to establish OCT as the mainstay of diagnostic imaging in neuro-ophthalmology.
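Since segmentation quality is reported as a Dice coefficient (DC), a hedged sketch of a common per-class formulation is shown below; the paper's exact implementation is not given in the abstract.

```python
# Sketch of the Dice coefficient used to score the segmentation, computed
# per class on integer label maps (a common formulation; the paper's exact
# implementation details are not specified in the abstract).
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for one tissue class."""
    pred_mask = pred == label
    target_mask = target == label
    intersection = np.logical_and(pred_mask, target_mask).sum()
    denom = pred_mask.sum() + target_mask.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Toy example: 2 classes (0 = background, 1 = ODD region) on a small B-scan.
pred = np.array([[0, 1, 1], [0, 1, 0]])
target = np.array([[0, 1, 1], [0, 0, 0]])
print(dice_coefficient(pred, target, label=1))  # 0.8
```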
Purpose: To assess whether the three-dimensional (3D) structural configuration of the central retinal vessel trunk and its branches (CRVT&B) could be used as a diagnostic marker for glaucoma. Methods: We trained a deep learning network to automatically segment the CRVT&B from the B-scans of optical coherence tomography (OCT) volumes of the optic nerve head (ONH). Two different approaches then used the structural configuration of the CRVT&B, as extracted from the OCT volumes, for glaucoma diagnosis. In the first approach, we aimed to provide a diagnosis using only a 3D CNN and the 3D structure of the CRVT&B. In the second approach, we projected the 3D structure of the CRVT&B onto three planes to obtain 2D images, and then used a 2D CNN for diagnosis. Segmentation accuracy was evaluated using the Dice coefficient, whereas diagnostic accuracy was assessed using the area under the receiver operating characteristic curve (AUC). The diagnostic performance of the CRVT&B was also compared with that of retinal nerve fiber layer (RNFL) thickness. Results: Our segmentation network was able to segment retinal blood vessels from OCT scans efficiently, achieving a Dice coefficient of 0.81 $\pm$ 0.07 on the test set. The 3D and 2D diagnostic networks were able to differentiate glaucoma from non-glaucoma subjects with accuracies of 82.7% and 83.3%, respectively. The corresponding AUCs of the CRVT&B were 0.89 and 0.90, higher than the AUC obtained with RNFL thickness alone. Conclusions: Our work demonstrated that the diagnostic power of the CRVT&B is superior to that of a gold-standard glaucoma parameter, i.e., RNFL thickness. It also suggests that the major retinal blood vessels form a skeleton whose configuration may be representative of the major ONH structural changes typically observed with the development and progression of glaucoma.
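A minimal sketch of the second approach's projection step is given below, using a maximum-intensity projection onto the three orthogonal planes (a common choice; the abstract does not specify the exact projection) with a stand-in binary segmentation volume.

```python
# Sketch of collapsing a binary 3D CRVT&B segmentation onto three orthogonal
# planes via maximum-intensity projection; the volume below is a stand-in.
import numpy as np

vessel_volume = np.zeros((64, 64, 64), dtype=np.uint8)  # stand-in 3D segmentation
vessel_volume[20:40, 30:34, 10:50] = 1                  # fake vessel branch

# One 2D image per anatomical plane; each becomes an input for a 2D CNN.
projections = [vessel_volume.max(axis=a) for a in range(3)]
for a, proj in enumerate(projections):
    print(f"projection along axis {a}: shape {proj.shape}, vessel pixels {proj.sum()}")
```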
In this paper, we propose a novel framework dubbed peer learning to deal with the problem of biased scene graph generation (SGG). This framework uses predicate sampling and consensus voting (PSCV) to encourage different peers to learn from each other, improving model diversity and mitigating bias in SGG. To address the heavily long-tailed distribution of predicate classes, we propose predicate sampling to divide and conquer the issue: because a single peer may not be sufficiently diverse to discriminate between different levels of the predicate distribution, we split the data by predicate frequency into sub-distributions, selecting head, body, and tail classes to combine and feed to different peers as complementary predicate knowledge during training. As a result, the model is less biased and makes more balanced predicate predictions. The complementary predicate knowledge of these peers is then ensembled with a consensus voting strategy, which, like a civilized vote in society, emphasizes the majority opinion while diminishing minority opinions. This approach ensures that the learned representations of each peer are optimally adapted to the various data distributions. Extensive experiments on the Visual Genome dataset demonstrate that PSCV outperforms previous methods and establishes a new state of the art (SOTA) on the SGCls task with a mean of \textbf{31.6}.
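The two PSCV ingredients could look roughly like the following sketch, under simplifying assumptions: predicates are bucketed into head/body/tail by training frequency, and peer predictions are fused by majority voting. The split points and voting details are illustrative placeholders, not the paper's values.

```python
# Illustrative sketch of (1) bucketing predicate classes into head/body/tail
# by frequency and (2) fusing peer predictions by majority ("consensus") vote.
from collections import Counter
import numpy as np

predicate_counts = Counter({"on": 9000, "has": 7000, "near": 800,
                            "riding": 600, "eating": 40, "flying_in": 10})
ranked = [p for p, _ in predicate_counts.most_common()]
head, body, tail = ranked[:2], ranked[2:4], ranked[4:]   # toy split points
print(head, body, tail)

def consensus_vote(peer_logits: np.ndarray) -> np.ndarray:
    """peer_logits: (n_peers, n_samples, n_classes) -> majority-voted labels."""
    votes = peer_logits.argmax(axis=-1)                  # (n_peers, n_samples)
    n_classes = peer_logits.shape[-1]
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return counts.argmax(axis=0)                          # (n_samples,)

logits = np.random.default_rng(0).normal(size=(3, 5, 6))  # 3 peers, 5 samples
print(consensus_vote(logits))
```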
Audio-visual scene understanding is a challenging problem due to the unstructured spatial-temporal relations that exist in audio signals, the spatial layouts of different objects, and the various texture patterns in visual images. Recently, many studies have focused on abstracting features with convolutional neural networks, while the learning of explicitly semantically relevant frames of sound signals and visual images has been overlooked. To this end, we present an end-to-end framework, the attentional graph convolutional network (AGCN), for structure-aware audio-visual scene representation. First, the sound spectrogram and the input image are processed by a backbone network for feature extraction. Then, to build multi-scale hierarchical information from the input features, we utilize an attention fusion mechanism to aggregate features from multiple layers of the backbone network. Notably, to represent the salient regions and contextual information of the audio-visual inputs, a salient acoustic graph (SAG), a contextual acoustic graph (CAG), a salient visual graph (SVG), and a contextual visual graph (CVG) are constructed for the audio-visual scene representation. Finally, the constructed graphs pass through a graph convolutional network for structure-aware audio-visual scene recognition. Extensive experimental results on audio, visual, and audio-visual scene recognition datasets show that AGCN achieves promising results. Visualizations of the graphs on spectrograms and images show that the proposed CAG/SAG and CVG/SVG focus on salient and semantically relevant regions.
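For intuition, the sketch below shows one generic graph-convolution step over region nodes (Kipf-and-Welling-style normalization), not the paper's exact AGCN formulation; the shapes and toy relation graph are assumptions.

```python
# Minimal sketch of one graph-convolution step: nodes are region features,
# edges encode their relations. Generic GCN layer, not the paper's AGCN.
import torch

def gcn_layer(x: torch.Tensor, adj: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """x: (n_nodes, d_in), adj: (n_nodes, n_nodes), weight: (d_in, d_out)."""
    a_hat = adj + torch.eye(adj.size(0))              # add self-loops
    deg = a_hat.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))            # symmetric normalization
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return torch.relu(a_norm @ x @ weight)

x = torch.randn(4, 16)                                # 4 salient regions, 16-d features
adj = (torch.rand(4, 4) > 0.5).float()                # toy relation graph
out = gcn_layer(x, adj, torch.randn(16, 8))
print(out.shape)                                      # torch.Size([4, 8])
```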
With the ever-growing popularity of the field of NLP, the demand for datasets in low-resourced languages follows suit. Following a previously established framework, in this paper we present the UNER dataset, a multilingual and hierarchical parallel corpus annotated for named entities. We describe in detail the procedure required to create this type of dataset in any language available on Wikipedia with DBpedia information. The three-step procedure extracts entities from Wikipedia articles, links them to DBpedia, and maps the DBpedia sets of classes to the UNER labels. This is followed by a post-processing procedure that significantly increases the number of identified entities in the final results. The paper concludes with a statistical and qualitative analysis of the resulting dataset.
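A schematic sketch of the final mapping step is shown below; the mapping table, record format, and names are hypothetical placeholders standing in for the real Wikipedia/DBpedia pipeline.

```python
# Schematic sketch of step 3 of the pipeline; the class-to-label table and the
# record format are hypothetical placeholders, not the actual UNER mapping.
DBPEDIA_TO_UNER = {               # hypothetical subset of the class mapping
    "dbo:Person": "PER",
    "dbo:Organisation": "ORG",
    "dbo:Place": "LOC",
}

def map_entities(linked_entities):
    """Map the DBpedia classes of linked entities onto UNER labels."""
    annotated = []
    for surface_form, dbpedia_classes in linked_entities:
        labels = [DBPEDIA_TO_UNER[c] for c in dbpedia_classes if c in DBPEDIA_TO_UNER]
        if labels:                             # keep only mappable entities
            annotated.append((surface_form, labels[0]))
    return annotated

# Toy output of steps 1-2 (extraction + DBpedia linking) for one sentence.
linked = [("Ada Lovelace", ["dbo:Person"]), ("London", ["dbo:Place"]),
          ("mathematics", ["dbo:Topic"])]
print(map_entities(linked))  # [('Ada Lovelace', 'PER'), ('London', 'LOC')]
```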
Deep neural networks have strong capabilities of memorizing the underlying training data, which can be a serious privacy concern. An effective solution to this problem is to train models with differential privacy (DP), which provides rigorous privacy guarantees by injecting random noise into the gradients. This paper focuses on the scenario where sensitive data are distributed among multiple participants, who jointly train a model through federated learning (FL), using both secure multiparty computation (MPC) to ensure the confidentiality of each gradient update and differential privacy to avoid data leakage in the resulting model. A major challenge in this setting is that common mechanisms for enforcing DP in deep learning, which inject real-valued noise, are fundamentally incompatible with MPC, which exchanges finite-field integers among the participants. Consequently, most existing DP mechanisms require rather high noise levels, leading to poor model utility. Motivated by this, we propose the Skellam mixture mechanism (SMM), an approach to enforce DP on models built via FL. Compared to existing methods, SMM eliminates the assumption that the input gradients must be integer-valued and thus reduces the amount of noise injected to preserve DP. Further, SMM allows tight privacy accounting due to the nice composition and sub-sampling properties of the Skellam distribution, which are key to accurate deep learning with DP. The theoretical analysis of SMM is highly non-trivial, especially considering (i) the complicated math of differentially private deep learning in general and (ii) the fact that the mixture of two Skellam distributions is rather complex and, to our knowledge, has not been studied in the DP literature. Extensive experiments on various practical settings demonstrate that SMM consistently and significantly outperforms existing solutions in terms of the utility of the resulting model.
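The core idea, that the difference of two i.i.d. Poisson draws is Skellam-distributed and therefore stays in the integers (so it composes with finite-field MPC), can be sketched as follows; the quantization scale and noise variance below are uncalibrated placeholders, not a real privacy budget.

```python
# Sketch of integer-valued Skellam noise: Poisson(mu) - Poisson(mu) is
# Skellam(mu, mu) with mean 0 and variance 2*mu, and remains an integer.
import numpy as np

def skellam_noise(shape, mu: float, rng) -> np.ndarray:
    """Skellam(mu, mu) sample: difference of two i.i.d. Poisson(mu) draws."""
    return rng.poisson(mu, size=shape) - rng.poisson(mu, size=shape)

rng = np.random.default_rng(0)
scale = 2 ** 16                                    # fixed-point quantization factor
gradient = rng.normal(size=1000)                   # toy real-valued gradient
quantized = np.round(gradient * scale).astype(np.int64)
noisy = quantized + skellam_noise(quantized.shape, mu=1e6, rng=rng)
print(noisy[:5], (noisy / scale)[:5])              # integers on the wire, floats after decode
```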
Inverse medium scattering solvers generally reconstruct a single solution without an associated measure of uncertainty. This is true both for the classical iterative solvers and for the emerging deep learning methods. But ill-posedness and noise can make this single estimate inaccurate or misleading. While deep networks such as conditional normalizing flows can be used to sample posteriors in inverse problems, they often yield low-quality samples and uncertainty estimates. In this paper, we propose U-Flow, a Bayesian U-Net based on conditional normalizing flows, which generates high-quality posterior samples and estimates physically meaningful uncertainty. We show that the proposed model significantly outperforms recent normalizing flows in terms of posterior sample quality while performing comparably to the U-Net in point estimation.
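As background, a toy conditional affine coupling layer, the standard building block of conditional normalizing flows, is sketched below; U-Flow's actual U-Net-based coupling networks are far larger, and conditioning by concatenation is an assumption here.

```python
# Toy conditional affine coupling layer: invertible in x given the measurement
# y, with a tractable log-determinant for the flow likelihood.
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, dim: int, cond_dim: int, hidden: int = 64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),  # predicts scale and shift
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, y], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                    # keep the log-scale bounded
        z2 = x2 * torch.exp(s) + t           # invertible given x1 and y
        log_det = s.sum(dim=1)               # contribution to the flow likelihood
        return torch.cat([x1, z2], dim=1), log_det

layer = ConditionalCoupling(dim=8, cond_dim=4)
z, log_det = layer(torch.randn(16, 8), torch.randn(16, 4))
print(z.shape, log_det.shape)                # torch.Size([16, 8]) torch.Size([16])
```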
Coverage path planning is a major application for mobile robots, which requires robots to move along a planned path to cover the entire map. For large-scale tasks, coverage path planning benefits greatly from multiple robots. In this paper, we describe Turn-minimizing Multirobot Spanning Tree Coverage Star (TMSTC*), an improved multirobot coverage path planning (mCPP) algorithm based on MSTC*. Our algorithm partitions the map into a minimum number of bricks that serve as the tree's branches, thereby transforming the problem into finding the maximum independent set of a bipartite graph. We then connect the bricks with a greedy strategy to form a tree, aiming to reduce the number of turns in the corresponding circumnavigating coverage path. Our experimental results show that our approach enables multiple robots to make fewer turns and thus complete terrain coverage tasks faster than other popular algorithms.
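The quantity TMSTC* seeks to reduce can be made concrete with a small sketch that counts heading changes along a grid coverage path; the waypoints below are illustrative.

```python
# Sketch of the objective being minimized: the number of turns along a grid
# coverage path (a turn is any heading change between consecutive moves).
def count_turns(path):
    """path: list of (row, col) grid cells visited in order."""
    turns = 0
    prev_heading = None
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        heading = (r1 - r0, c1 - c0)
        if prev_heading is not None and heading != prev_heading:
            turns += 1
        prev_heading = heading
    return turns

# Boustrophedon sweep of a 2x3 area: two corners, hence two heading changes.
path = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
print(count_turns(path))  # 2
```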
Many Click-Through Rate (CTR) prediction works have focused on designing advanced architectures to model complex feature interactions while neglecting the importance of feature representation learning, e.g., adopting a plain embedding layer for each feature, which results in sub-optimal feature representations and thus inferior CTR prediction performance. For instance, low-frequency features, which account for the majority of features in many CTR tasks, are less considered in standard supervised learning settings, leading to sub-optimal feature representations. In this paper, we introduce self-supervised learning to produce high-quality feature representations directly and propose a model-agnostic Contrastive Learning for CTR (CL4CTR) framework consisting of three self-supervised learning signals that regularize feature representation learning: contrastive loss, feature alignment, and field uniformity. The contrastive module first constructs positive feature pairs by data augmentation and then minimizes the distance between the representations of each positive feature pair via the contrastive loss. The feature alignment constraint forces the representations of features from the same field to be close, and the field uniformity constraint forces the representations of features from different fields to be distant. Extensive experiments verify that CL4CTR achieves the best performance on four datasets and has excellent effectiveness and compatibility with various representative baselines.
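Hedged sketches of two of the three regularizers (the contrastive loss and field uniformity; feature alignment is analogous with the sign flipped) are shown below; these are common formulations and may differ from the paper's exact definitions.

```python
# Sketch of two CL4CTR-style regularizers on feature embeddings; the exact
# losses in the paper may differ, these are common formulations.
import torch
import torch.nn.functional as F

def contrastive_loss(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
    """Pull two augmented views of the same features together (L2 distance)."""
    return (h1 - h2).pow(2).sum(dim=1).mean()

def field_uniformity(field_emb: torch.Tensor) -> torch.Tensor:
    """Push embeddings of different fields apart via mean pairwise cosine sim."""
    sim = F.cosine_similarity(field_emb.unsqueeze(0), field_emb.unsqueeze(1), dim=-1)
    off_diag = sim - torch.eye(sim.size(0))       # zero out the self-similarity
    return off_diag.mean()

emb = torch.randn(5, 16, requires_grad=True)      # 5 fields, 16-d embeddings
view1, view2 = emb + 0.1 * torch.randn_like(emb), emb + 0.1 * torch.randn_like(emb)
loss = contrastive_loss(view1, view2) + field_uniformity(emb)
loss.backward()                                   # regularizes representation learning
print(loss.item())
```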